Language Models for Machine Translation: Original vs. Translated Texts
Authors
Abstract
We investigate the differences between language models (LMs) compiled from texts originally written in the target language and LMs compiled from texts translated into the target language. Corroborating established observations of Translation Studies, we demonstrate that the latter are significantly better predictors of translated sentences than the former, and hence fit the reference set better. Furthermore, translated texts yield better language models for statistical machine translation than original texts.

Statistical machine translation (MT) uses large target-language models (LMs) to improve the fluency of generated texts, and it is commonly assumed that for the construction of language models, "more data is better data" (Brants and Xu, 2009). Not all data, however, are created equal. In this work we distinguish between LMs compiled from texts originally written in the target language and LMs compiled from translated texts. The motivation for our work stems from a large body of research in Translation Studies establishing that original texts differ significantly from translated ones in various respects (Gellerstam, 1986). Corpus-based computational analysis has recently corroborated this observation, and Kurokawa et al. (2009) applied it to statistical machine translation, showing that a translation model for an English-to-French MT system is better trained on a parallel corpus whose French side was translated from English, and vice versa.

Our research question is whether a language model compiled from translated texts can further improve translation quality. We test this hypothesis on three translation tasks: Hebrew-to-English, German-to-English, and French-to-English. First, for each language pair we build two English language models from two types of corpora: texts originally written in English, and human translations from the source language into English. We show that for each language pair, the latter language model fits a set of reference translations better in terms of perplexity. In other words, LMs can successfully distinguish between original and translated texts. Moreover, we demonstrate that the differences between the two LMs are not biased by content but rather reflect differences in abstract linguistic features.

Research in Translation Studies indicates that certain translation universals exist, causing texts translated from several languages into a single target language to resemble one another along various axes (Baker, 1993, 1995, 1996). To test this hypothesis, we compile additional English LMs, this time from texts translated into English from languages other than the source. Again, we use perplexity to assess the fit of these LMs to reference sets of sentences translated from the source language into English. We show that these LMs depend on the source language and differ from each other. They outperform the original-based LMs, but the LMs compiled from texts translated from the source language still fit the reference set best.

Finally, we train a phrase-based MT system (Koehn et al., 2003) for each language pair, using parallel corpora comprising components translated from the source and from the target languages. We use four LMs: one original, one translated from the source language, one translated from another language, and one (baseline) oblivious to the source language. We show that the translated-from-source-language LMs provide a significant improvement in the quality of the translation output over all other LMs.
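To make the perplexity comparison concrete, here is a minimal, self-contained sketch of the core measurement: it trains one small trigram model on original English text and one on English translated from the source language, then checks which assigns lower perplexity to a held-out reference set. This is only an illustration with add-one smoothing and hypothetical file names, not the authors' actual setup, which relies on full-scale n-gram language models.

```python
import math
from collections import Counter

N = 3  # trigram models, as a small stand-in for full-scale LMs

def ngram_counts(sentences, n):
    """Count n-grams and their (n-1)-gram histories over padded sentences."""
    grams, hists = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>"] * (n - 1) + sent.split() + ["</s>"]
        for i in range(len(toks) - n + 1):
            grams[tuple(toks[i:i + n])] += 1
            hists[tuple(toks[i:i + n - 1])] += 1
    return grams, hists

def perplexity(sentences, grams, hists, vocab_size, n=N):
    """Add-one-smoothed perplexity of `sentences` under the counted model."""
    log_prob, n_events = 0.0, 0
    for sent in sentences:
        toks = ["<s>"] * (n - 1) + sent.split() + ["</s>"]
        for i in range(len(toks) - n + 1):
            g = tuple(toks[i:i + n])
            log_prob += math.log((grams[g] + 1) / (hists[g[:-1]] + vocab_size))
            n_events += 1
    return math.exp(-log_prob / n_events)

def load(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

# Hypothetical, pre-tokenized corpora: original English, human translations
# into English, and a held-out reference set of translated sentences.
original = load("original_english.txt")
translated = load("translated_to_english.txt")
references = load("reference_translations.txt")

vocab_size = len({w for s in original + translated for w in s.split()}) + 2
for name, corpus in [("original", original), ("translated", translated)]:
    grams, hists = ngram_counts(corpus, N)
    print(f"{name:10s} perplexity on references: "
          f"{perplexity(references, grams, hists, vocab_size):.2f}")
```

Under the paper's findings, the model trained on translated text should report the lower perplexity on the reference set.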
The main findings of this work, therefore, are that original and translated texts exhibit significant, measurable differences; that LMs compiled from translated texts fit translated references better than LMs compiled from original texts (with LMs compiled from texts translated from languages other than the source falling in between); and that these differences yield a significant improvement in the quality of MT systems whose LMs are compiled from translated texts.
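In practice, one would build the four LMs with a standard toolkit and compare their fit to the reference set directly. The sketch below assumes four pre-built ARPA-format models (e.g., estimated with KenLM) and a tokenized English reference file; the file names are hypothetical placeholders, and the use of the kenlm Python binding is an illustrative assumption, not part of the paper.

```python
# Score one reference set under several pre-built ARPA-format LMs and
# report corpus-level perplexity. File names are hypothetical.
import kenlm  # Python binding of KenLM (https://github.com/kpu/kenlm)

LM_FILES = {
    "original English":       "lm_original.arpa",
    "translated from source": "lm_translated_from_source.arpa",
    "translated from other":  "lm_translated_from_other.arpa",
    "mixed baseline":         "lm_mixed.arpa",
}

def corpus_perplexity(model, sentences):
    """10^(-total log10 prob / total tokens), counting </s> per sentence."""
    log10_prob = sum(model.score(s, bos=True, eos=True) for s in sentences)
    n_tokens = sum(len(s.split()) + 1 for s in sentences)
    return 10.0 ** (-log10_prob / n_tokens)

with open("references.tok.en") as f:
    references = [line.strip() for line in f if line.strip()]

for name, path in LM_FILES.items():
    ppl = corpus_perplexity(kenlm.Model(path), references)
    print(f"{name:25s} perplexity = {ppl:8.2f}")
```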
Similar Articles
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) lie at the core of Machine Translation (MT) engines, as engines are developed through frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from the Lexical Similarity set on machine tra...
Translationese: Between Human and Machine Translation
Translated texts, in any language, have unique characteristics that set them apart from texts originally written in the same language. Translation Studies is a research field that focuses on investigating these characteristics. Until recently, research in machine translation (MT) has been entirely divorced from translation studies. The main goal of this tutorial is to introduce some of the find...
Statistical Machine Translation with Automatic Identification of Translationese
Translated texts (in any language) are so markedly different from original ones that text classification techniques can be used to tease them apart. Previous work has shown that awareness of these differences can significantly improve statistical machine translation. These results, however, required meta-information on the ontological status of texts (original or translated) which is typically ...
متن کاملA new model for persian multi-part words edition based on statistical machine translation
Multi-part words in the English language are hyphenated, with the hyphen separating the parts. The Persian language contains multi-part words as well. According to Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead of a half-space. This common incorrect use of space leads to some s...
The Back-translation Score: Automatic MT Evaluation at the Sentence Level without Reference Translations
Automatic tools for machine translation (MT) evaluation such as BLEU are well established, but they have two drawbacks: they do not perform well at the sentence level, and they presuppose manually translated reference texts. Assuming that the MT system to be evaluated can handle both directions of a language pair, in this research we suggest conducting automatic MT evaluation by determini...